Abstract. We show that an arbitrary probability distribution can be represented in
exponential form. In physical contexts, this implies that the equilibrium distribution 
of any classical or quantum dynamical system is expressible in grand canonical form. 
Exponential families of probability distributions play central roles in information
theory [1], statistics [2], and statistical mechanics [3]. This raises the natural
question of whether a given system of probability distributions admits a representation
in exponential form. Recent work has shown that, in the case of finite-dimensional
quantum systems, the time average of the density matrix can be expressed as a grand
canonical state, which assumes an exponential form. Motivated by this result, in the
present paper we derive a general theorem stating that an arbitrary system of discrete
or continuous probability densities admits a representation in the form of an exponential
family. This is surprising in that even power-law distributions are thereby representable
in exponential form.

The paper is organised as follows. We first establish the result for discrete and 
finite probability densities. An example of this result has been demonstrated in [3]; the
purpose here is to provide a simpler derivation of the general result. We then proceed
to consider the exponential representation for an arbitrary smooth positive probability
density function $\pi(x)$, and show that an expression of the form $\pi(x) = \exp\left(-\sum_k \beta_k x^k\right)$
is always possible.

1. We begin our analysis in the case of a finite-dimensional discrete probability
distribution. Let $H$ be a random variable assuming distinct values $\{E_i\}_{i=0,1,\ldots,n}$ with
probabilities $\{\pi_i\}_{i=0,1,\ldots,n}$. Then there is a linearly independent family of $n+1$ random
variables, including $H$ itself, such that any of these random variables can be expressed
as a function of $H$. There is freedom in the choice of the family; here, for simplicity,
we choose the powers of $H$; thus, our family of independent random variables is just
the set $\{1, H, H^2, H^3, \ldots, H^n\}$. The linear independence of these random variables, i.e.
the fact that the matrix of powers $\{E_i^k\}$ is nonsingular, follows from the elementary fact
that an $n$-th order polynomial vanishing at $n+1$ distinct points must be identically zero.
Moreover, the powers $H^m$ for all $m > n$ are expressible as linear combinations
of the powers $\{H^k\}_{k=0,1,\ldots,n}$. We define the moments $\{\mu_k\}_{k=0,1,\ldots,n}$ of $H$ by

$$\mu_k = \sum_{i=0}^{n} \pi_i E_i^k, \tag{1}$$

where $\mu_0 = 1$. To establish the existence of an exponential representation for $\{\pi_i\}$
two further ingredients are needed; the first is the logarithmic entropy of Shannon and
Wiener, defined by

$$S = -\sum_{i=0}^{n} \pi_i \ln \pi_i. \tag{2}$$

The second is the family of variables $\{\beta_k\}_{k=1,\ldots,n}$ conjugate to the moments $\{\mu_k\}$ with
respect to the entropy $S$ in the sense that

$$\beta_k = \frac{\partial S}{\partial \mu_k}. \tag{3}$$

We then have the following result: 

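Proposition 1 The family of probabilities $\{\pi_i\}_{i=0,1,\ldots,n}$ introduced above can be expressed
in the exponential form

$$\pi_i = \exp\left(-\sum_{k=1}^{n} \beta_k E_i^k - \ln Z(\beta)\right), \tag{4}$$

where $Z(\beta) = \sum_{i=0}^{n} \exp\left(-\sum_{k=1}^{n} \beta_k E_i^k\right)$.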

Since the matrix $\{E_i^k\}$ is nonsingular, equations (1) can be solved to express the
$\{\pi_i\}$ as linear functions of the moments $\{\mu_k\}$. That is, we can write

$$\pi_i = \sum_{j=0}^{n} c_{ij}\,\mu_j, \tag{5}$$

where the constant coefficient matrix $\{c_{ij}\}$ is just the inverse of the matrix $\{E_i^k\}$. Since
the entropy is a concave function of the moments $\{\mu_k\}$, the conjugate variables $\{\beta_k\}$
introduced in (3) are in one-to-one correspondence with $\{\mu_k\}$. In other words, (3)
defines a Legendre transform [5]. Thus, in principle we can express the moments $\{\mu_k\}$
in terms of the conjugate variables $\{\beta_k\}$, substitute the results in (5), and express the
probabilities $\{\pi_i\}$ in terms of the variables $\{\beta_k\}$. The proposition above states that
the result of this nonlinear transform can be expressed analytically, and is given by an
exponential family of distributions.
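As a minimal worked illustration of the statement, consider the two-valued case $n = 1$. The values $E_0 = 0$, $E_1 = 1$ and the parameter $p$, with $\pi_0 = 1 - p$ and $\pi_1 = p$, are assumptions introduced here for illustration only; they are not part of the original argument.

```latex
\begin{align*}
\mu_1 &= \pi_0 E_0 + \pi_1 E_1 = p, \\
S &= -p\ln p - (1-p)\ln(1-p), \\
\beta_1 &= \frac{\partial S}{\partial \mu_1} = \ln\frac{1-p}{p}, \\
Z(\beta) &= 1 + e^{-\beta_1} = \frac{1}{1-p}, \\
\pi_1 &= \frac{e^{-\beta_1}}{Z(\beta)} = \frac{p}{1-p}\,(1-p) = p .
\end{align*}
```

Likewise $\pi_0 = 1/Z(\beta) = 1 - p$, so in this simple case the exponential form reproduces both probabilities.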

Proof. Since the row vectors $(E_0^k, E_1^k, E_2^k, \ldots, E_n^k)$ for $k = 0, 1, \ldots, n$ are
linearly independent, we can express the vector $-\ln\pi_i \in \mathbb{R}^{n+1}$ in the form

$$-\ln \pi_i = \sum_{k=0}^{n} \beta_k E_i^k \tag{6}$$

for some coefficients $\{\beta_k\}$. Substituting (6) in (2), we obtain

$$S = \sum_{k=0}^{n} \beta_k \mu_k, \tag{7}$$

from which we deduce (3) a posteriori. Finally, solving for $\pi_i$ we obtain the desired
form (4), where the normalisation condition for $\{\pi_i\}$ implies that $\beta_0 = \ln Z(\beta)$. □
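The construction used in the proof can be checked numerically. The following Python sketch assumes NumPy; the values `E` and `p` and the helper name `exponential_parameters` are hypothetical choices made for illustration. It solves the linear system (6) for the coefficients and then rebuilds the probabilities from the exponential form (4) on a small power-law-like example.

```python
import numpy as np

def exponential_parameters(E, p):
    """Solve -ln p_i = sum_k beta_k E_i^k (eq. 6) for beta_0, ..., beta_n."""
    E = np.asarray(E, dtype=float)
    p = np.asarray(p, dtype=float)
    # V[i, k] = E_i^k; nonsingular because the E_i are distinct.
    V = np.vander(E, N=len(E), increasing=True)
    beta = np.linalg.solve(V, -np.log(p))
    return V, beta

# Hypothetical example data: four distinct values with power-law weights.
E = np.array([1.0, 2.0, 3.0, 4.0])
p = E ** -2.0
p /= p.sum()

V, beta = exponential_parameters(E, p)

# Eq. (4): rebuild p from beta_1, ..., beta_n and the normalisation Z.
weights = np.exp(-V[:, 1:] @ beta[1:])
Z = weights.sum()
p_rebuilt = weights / Z

mu = V.T @ p                   # moments, eq. (1); mu[0] equals 1
S = -np.sum(p * np.log(p))     # entropy, eq. (2)
```

In this sketch `p_rebuilt` should agree with `p`, `beta[0]` with `np.log(Z)`, and `S` with `beta @ mu`, up to floating-point error, mirroring (4), the normalisation condition, and (7) respectively.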
By the above result, the nonlinear transform (3) can be inverted analytically in the
form

$$\mu_k = \sum_{i=0}^{n} E_i^k \exp\left(-\sum_{l=0}^{n} \beta_l E_i^l\right). \tag{8}$$
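A short numerical check of the conjugacy relation (3) and its inverse (8) may be helpful. The sketch below assumes NumPy, uses hypothetical values `E` and `p`, and approximates the derivatives of the entropy by central finite differences with an illustrative step `h`; the helper `entropy_from_moments` is likewise introduced only for this example.

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])                 # hypothetical distinct values E_i
p = np.array([0.5, 0.3, 0.2])                 # hypothetical probabilities pi_i
V = np.vander(E, N=len(E), increasing=True)   # V[i, k] = E_i^k

def entropy_from_moments(mu):
    """S as a function of the moments: invert eq. (1), then apply eq. (2)."""
    q = np.linalg.solve(V.T, mu)              # probabilities from moments, eq. (5)
    return -np.sum(q * np.log(q))

mu = V.T @ p                                  # eq. (1)
beta = np.linalg.solve(V, -np.log(p))         # eq. (6); beta[0] = ln Z

# Eq. (8): moments recovered directly from the conjugate variables.
mu_from_beta = V.T @ np.exp(-V @ beta)

# Eq. (3): central differences of S with respect to mu_k for k >= 1
# (mu_0 = 1 is held fixed by normalisation).
h = 1e-6
dS = np.array([
    (entropy_from_moments(mu + h * e) - entropy_from_moments(mu - h * e)) / (2 * h)
    for e in np.eye(len(mu))[1:]
])
```

Here `mu_from_beta` should reproduce `mu`, and `dS` should agree with `beta[1:]` to within the finite-difference error.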

2. An exponential representation can also be derived in the case of an arbitrary 
smooth probability density function. In the continuous case, however, the moments of 
the distribution need not exist in general. Therefore, some of the preceding constructions 
involving entropy and moments must be altered. We state the main result first: 

Proposition 2 Let $\pi(x)$ be a probability density function on the real line such that
$\ln\pi(x)$ is quadratically integrable with respect to the Gaussian measure $e^{-x^2}dx$. Then
$\pi(x)$ can be expressed in the exponential form

$$\pi(x) = \exp\left(-\sum_{k=1}^{n} \beta_k x^k - \ln Z(\beta)\right), \tag{9}$$

where $Z(\beta) = \int_{-\infty}^{\infty} \exp\left(-\sum_{k=1}^{n} \beta_k x^k\right) dx$, and where the value of $n$ may be infinite. The
parameters $\{\beta_k\}$ are uniquely determined by $\pi(x)$.

The statement of Proposition 2 is perhaps surprising, because the representation
(9) applies, for example, to power-law distributions such as the Cauchy distribution
$1/[\pi(1 + x^2)]$ for which none of the moments exists. The proof goes as follows.

Proof. Since by assumption $\ln\pi(x)$ is quadratically integrable with respect to the
Gaussian measure, one can expand $\ln\pi(x) \in \mathcal{L}^2(\mathbb{R}, e^{-x^2}dx)$ in terms of the Hermite
polynomials $\{H_k(x)\}$, that is,

$$\ln \pi(x) = -\sum_{k=0}^{\infty} \gamma_k H_k(x), \tag{10}$$



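where, for Hermite polynomials normalised to unit norm in $\mathcal{L}^2(\mathbb{R}, e^{-x^2}dx)$,

$$\gamma_k = -\int_{-\infty}^{\infty} H_k(x)\,\ln\pi(x)\,e^{-x^2}\,dx. \tag{11}$$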

The infinite series on the right side of (10) converges almost everywhere, since the squared
Hilbert space norm $\sum_k \gamma_k^2$ converges by assumption. Next, define a set of numbers $\{\beta_k\}$
by the prescription

$$\sum_k \beta_k x^k = \sum_k \gamma_k H_k(x). \tag{12}$$

Substituting this in (10) and solving the result for $\pi(x)$, we deduce (9), where the
normalisation condition implies that $\beta_0 = \ln Z$. □
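The Hermite expansion used in the proof can be explored numerically for the Cauchy density. The sketch below is illustrative only: it assumes NumPy's physicists' Hermite polynomials, which are not normalised to unit norm, so each projection is divided by $\sqrt{\pi}\,2^k k!$; the integrals are approximated by 100-point Gauss-Hermite quadrature, and the function names are hypothetical. It computes truncated coefficients, measures the weighted squared error of the truncation, and rewrites a truncated series in powers of $x$ as in (12).

```python
import math
import numpy as np
from numpy.polynomial import hermite as H
from numpy.polynomial import polynomial as P

def log_cauchy(x):
    """ln pi(x) for the standard Cauchy density 1 / [pi (1 + x^2)]."""
    return -np.log(np.pi * (1.0 + x ** 2))

# Gauss-Hermite nodes and weights for the measure exp(-x^2) dx.
nodes, weights = H.hermgauss(100)
f = log_cauchy(nodes)

def gamma_coefficients(order):
    """gamma_k in ln pi = -sum_k gamma_k H_k, for unnormalised H_k."""
    gam = np.empty(order + 1)
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                               # selects H_k
        norm = math.sqrt(math.pi) * 2.0 ** k * math.factorial(k)
        gam[k] = -np.sum(weights * f * H.hermval(nodes, basis)) / norm
    return gam

def weighted_error(order):
    """Squared distance in L^2(exp(-x^2) dx) between ln pi and its truncation."""
    gam = gamma_coefficients(order)
    return np.sum(weights * (f + H.hermval(nodes, gam)) ** 2)

errors = [weighted_error(n) for n in (0, 4, 20)]

# Eq. (12): the same truncated series rewritten in powers of x.
gamma = gamma_coefficients(6)
beta = H.herm2poly(gamma)
x = np.linspace(-1.0, 1.0, 5)
max_gap = np.max(np.abs(P.polyval(x, beta) - H.hermval(x, gamma)))
```

The entries of `errors` should decrease as the truncation order grows, and `max_gap` should be at the level of rounding error, since `beta` and `gamma` describe the same polynomial in two bases.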
Can we establish a relation analogous to (3) for a general probability density
function? To this end we introduce what might appropriately be called the 'Gaussian
moments' of $\pi(x)$ by defining

$$\mu_k = \int_{-\infty}^{\infty} x^k\,\pi(x)\,e^{-x^2}\,dx. \tag{13}$$
Similarly, we define the 'Gaussian entropy' of $\pi(x)$ by

$$S = -\int_{-\infty}^{\infty} \pi(x)\,\ln\pi(x)\,e^{-x^2}\,dx. \tag{14}$$

The coefficients $\{\beta_k\}$ appearing in (9) are then related to the Gaussian moments $\{\mu_k\}$
defined by (13) via relation (3), provided we use the Gaussian entropy (14).

The family of density functions for which $\ln\pi(x)$ is quadratically integrable with
respect to the Gaussian measure is fairly large and includes, in particular, all the
power-law distributions. However, this family is not exhaustive. Nevertheless, the
representation (9) can be established for a much wider class of density functions. The
idea is to extend the formulation based on the Gaussian measure into the class $\mathcal{S}$ of
positive Schwartz functions (by this we mean infinitely differentiable functions, each
derivative of which decays faster than any inverse polynomial). This class forms
a convex cone which includes, in particular, the Gaussian function $e^{-x^2}$. Let $s(x) \in \mathcal{S}$
be a positive Schwartz function such that $s(x)\ln\pi(x)$ is quadratically integrable with
respect to the Lebesgue measure. We then construct orthonormal polynomials $\{J_k(x)\}$
in $\mathcal{L}^2(\mathbb{R}, s^2(x)dx)$ by means of the Gram-Schmidt process. Approximating by integration
over a finite interval, we can then apply the Weierstrass approximation theorem to
establish the completeness of the set $\{J_k(x)\}$. The function $\ln\pi(x)$ can therefore be
expanded in a form analogous to (10), with almost everywhere convergence. The
coefficients $\{\gamma_k\}$ depend upon the choice of the Schwartz function $s(x)$, whereas the
expansion coefficients $\{\beta_k\}$ defined in a manner analogous to (12) are basis independent.
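The Gram-Schmidt construction can be sketched numerically on a finite interval. The example below is a rough illustration under stated assumptions: the choice $s(x) = e^{-x^2/2}$, the Cauchy density, the interval $[-8, 8]$, the grid size, and the polynomial degree are all hypothetical, the integrals are replaced by simple Riemann sums, and a QR factorisation of the weighted monomials stands in for explicit Gram-Schmidt orthogonalisation (it spans the same nested polynomial spaces).

```python
import numpy as np

def orthonormal_polynomials(x, weight, degree):
    """Return J[j, k] = J_k(x_j), orthonormal under sum_j weight_j J_k J_l."""
    V = np.vander(x, N=degree + 1, increasing=True)   # monomials 1, x, ..., x^degree
    # QR of the weighted monomials plays the role of Gram-Schmidt.
    Q, R = np.linalg.qr(np.sqrt(weight)[:, None] * V)
    return V @ np.linalg.inv(R)                       # columns are J_k on the grid

# Hypothetical setup: s(x) = exp(-x^2 / 2) and a Cauchy density on [-8, 8].
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
s = np.exp(-x ** 2 / 2.0)
log_pi = -np.log(np.pi * (1.0 + x ** 2))
w = s ** 2 * dx                                       # Riemann weights for s^2(x) dx

J = orthonormal_polynomials(x, w, degree=8)
gram = J.T @ (w[:, None] * J)                         # discrete Gram matrix of the J_k

# Expansion coefficients of ln pi in the J_k basis, and the truncation error.
coeffs = J.T @ (w * log_pi)
approx = J @ coeffs
residual = np.sum(w * (log_pi - approx) ** 2)
baseline = np.sum(w * log_pi ** 2)
```

In this sketch `gram` should be close to the identity matrix, and `residual` should be a small fraction of `baseline`, indicating that a low-degree truncation already captures most of $\ln\pi(x)$ in the weighted norm.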

To show that the representation (9) is valid for all smooth density functions we
proceed as follows. First, we observe that since $\pi(x)$ is nonnegative, $[1 + (\ln\pi(x))^2]^{-1}$ is
less than or equal to one for all $x$. Therefore, the function $f(x) = s(x)/[1 + (\ln\pi(x))^2]$,
for any $s(x) \in \mathcal{S}$, decays faster than any inverse polynomial. Thus, for an arbitrary
smooth density function $\pi(x)$, the logarithm $\ln\pi(x)$ is by construction quadratically
integrable in $\mathcal{L}^2(\mathbb{R}, f(x)dx)$. Of course, the density function could be so perverse that
$f(x)$ does not belong to $\mathcal{S}$, i.e. the derivatives of $f(x)$ need not decay faster than any
inverse polynomial. However, the behaviour of these derivatives is immaterial for our
construction, since we merely require that all polynomials are quadratically integrable
with respect to the measure $f(x)dx$. Consequently, the above exponential representation
is indeed valid for all smooth density functions.

In statistics, the exponential family of distributions is generally defined as the
totality of density functions that admit representations of the form $\exp\left(-\sum_{k=0}^{n} \beta_k T_k(x)\right)$
for a set of functions (sufficient statistics) $\{T_k(x)\}$, where $n$ is usually assumed finite.
The foregoing result thus implies that the exponential family of distributions is dense in
the totality of probability distributions. Thus, the study of probability distributions
could, in principle, be restricted to the exponential type. In specific applications,
the practicality of this depends upon the density function $\pi(x)$ and the choice of the
Schwartz function $s(x)$, since the rate of convergence depends upon these ingredients.

From the physical point of view, the result established here also leads to an 
interesting observation concerning equilibrium properties of generic dynamical systems. 
We note that if a dynamical system is in equilibrium, then the associated equilibrium
distribution is necessarily an energy distribution, since steady state solutions to the 
Liouville equation (or the Heisenberg equation in the case of a quantum system) are 
given by functions of the Hamiltonian. Thus, we conclude that if a dynamical system 
is in equilibrium, then the relevant equilibrium distribution is necessarily expressible 
in grand canonical form. We emphasise that this result applies not only to thermal 
equilibrium, but to any form of equilibrium state of a dynamical system. 